Table recognition in mathematical documents

Author

  • Mohamed A. Alkalai
Abstract

While a number of techniques have been developed for table recognition in ordinary text documents, these techniques are often ineffective for tables in mathematical documents, as tables containing mathematical structures can differ quite significantly from ordinary text tables. In fact, it is difficult even to distinguish clearly between table recognition in mathematics and layout analysis of mathematical formulas. Likewise, it is not straightforward to adapt general layout analysis techniques to mathematical formulas. However, a reliable understanding of formula layout is often a necessary prerequisite for further semantic interpretation of the represented formulae. In this thesis, we present the necessary preprocessing steps towards a table recognition technique that specialises in tables in mathematical documents. It is based on our novel, robust line recognition technique for mathematical expressions, which is fully independent of understanding the content or specialist fonts of expressions. We also present a graph representation for complex mathematical table structures. A set of rewriting rules applied to the graph allows for reliable re-composition of cells in order to identify several valid table interpretations. We demonstrate the effectiveness of our technique by applying it to a set of mathematical tables from a standard textbook that has been manually ground-truthed.
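The abstract does not spell out the graph representation or the rewriting rules, so the following is only a minimal sketch of the general idea of re-composing cells by rewriting a cell graph. Everything in it is an assumption made for illustration: the `Cell` fields, the "nearest right-hand neighbour" edges, the single gap-based merge rule, and the `max_gap` parameter are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A candidate table cell (hypothetical model): text, horizontal extent, row index."""
    text: str
    x0: float
    x1: float
    row: int


def build_graph(cells):
    """Link each cell to its nearest right-hand neighbour in the same row."""
    by_row = {}
    for cell in cells:
        by_row.setdefault(cell.row, []).append(cell)
    edges = {}
    for row_cells in by_row.values():
        row_cells.sort(key=lambda c: c.x0)
        for left, right in zip(row_cells, row_cells[1:]):
            edges[left] = right
    return edges


def merge_rule(cells, max_gap):
    """Illustrative rewrite rule: fuse neighbours whose horizontal gap is below max_gap.

    The rule is re-applied until no edge in the graph matches any more.
    """
    for left, right in build_graph(cells).items():
        if right.x0 - left.x1 < max_gap:
            merged = Cell(f"{left.text} {right.text}", left.x0, right.x1, left.row)
            rest = [c for c in cells if c not in (left, right)]
            return merge_rule(rest + [merged], max_gap)
    return sorted(cells, key=lambda c: (c.row, c.x0))


# Four fragments on one line: "x", "+", "y" sit close together, "z" is far away.
cells = [
    Cell("x", 0.0, 1.0, 0),
    Cell("+", 1.2, 1.8, 0),
    Cell("y", 2.0, 3.0, 0),
    Cell("z", 6.0, 7.0, 0),
]
coarse = merge_rule(cells, max_gap=0.5)  # fragments of one formula fused into a cell
fine = merge_rule(cells, max_gap=0.1)    # every fragment kept as its own cell
```

Running the same rule with different thresholds yields different groupings of the same fragments, which is one simple way to picture how a rewriting system could produce several candidate table interpretations.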

Similar resources

An Efficient Recognition and Data Extraction Method for Table-Form Documents

In Asia, many documents processed in offices are table-form documents. Hence the automatic processing of table-form documents is an important issue in office automation research. In this paper, we propose an efficient representation method for table-form documents. The representation method is based on three types of line segments. The line segments are normalized and sorted, hence the repr...

Recognising Tabular Mathematical Expressions Using Graph Rewriting

While a number of techniques have been developed for table recognition in ordinary text documents, very little work has been done on tables that contain mathematical expressions. The latter problem is complicated by the fact that mathematical formulae often have a tabular layout themselves, thus not only blurring the distinction between table and content structure, but often leading to a number...

An Adaptative Recognition System Using a Table Description Language for Hierarchical Table Structures in Archival Documents

Archival documents are difficult to recognize because they are often damaged. Moreover, variations between documents are significant even for documents that a priori have the same structure. A recognition system that overcomes these difficulties needs external knowledge. Therefore we present a recognition system using a user description. To use table descriptions in analyzing the image, our system ...

Image Registration and Text Recognition for Structured Census Documents

In this paper, we present our work on developing a system for registration and recognition of structured census documents. Information extraction from these documents presents many challenges, for instance, table registration, cell extraction, binarization, and recognition of handwritten text. This paper mainly deals with table registration. It details the approach and algorithms we developed fo...

Extraction of Logical Structure from Articles in Mathematics

We propose a mathematical knowledge browser which helps people to read mathematical documents. With the browser, printed mathematical documents can be scanned and recognized by OCR (Optical Character Recognition). Then the meta-information (e.g. title, author) and the logical structure (e.g. section, theorem) of the documents are automatically extracted. The purpose of this paper is to show the ex...

Publication date: 2015